Generating stressed speech from neutral speech using a modified CELP vocoder
Authors: Sahar
Abstract
This paper addresses the problem of speech modeling for generating stressed speech using a source generator framework. In this context, stress refers to emotional or task-induced speaking conditions. This study focuses on speech produced under angry, loud, and Lombard effect (i.e., speech produced in noise) speaking conditions. Source generator theory was originally developed for equalization of speech under stress for robust recognition (Hansen, 1993, 1994). It was later used to generate simulated stressed training tokens for improved recognition (Bou-Ghazale, 1993; Bou-Ghazale and Hansen, 1994). The objective here is to generate stress-perturbed speech from neutral speech using the source generator framework previously employed for stressed speech recognition. The approach is based on (i) developing a mathematical model that represents the change in speech production under stressed conditions for perturbation, and (ii) employing this framework in an isolated-word speech processing system to produce emotional/stressed perturbed speech from neutral speech. A stress perturbation algorithm is formulated based on a CELP (code-excited linear prediction) speech synthesis structure. The algorithm is evaluated using four different speech feature perturbation sets. The stressed speech parameter evaluations from this study revealed that pitch is capable of reflecting the emotional state of the speaker, while formant information alone is not as good a correlate of stress. However, the combination of formant location, pitch, and gain information proved to be the most reliable indicator of emotional stress under a CELP speech model. Formal listener evaluations of the generated stressed speech show successful classification rates of 87% for angry speech, 75% for Lombard effect speech, and 92% for loud speech.
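The abstract names three perturbed features (pitch, formant location, and gain) but does not give the perturbation rules. The Python sketch below shows one way such per-frame perturbations could be applied to linear-prediction parameters. It assumes each frame has already been analyzed into LPC coefficients, a pitch period in samples, and a gain. Everything here is an illustrative assumption and not the paper's method. This includes the hypothetical function names (`scale_formants`, `perturb_frame`, `synthesize`), the multiplicative factors (`f0_scale`, `formant_scale`, `gain_scale`), the use of pole-angle scaling to move formants, and the simple impulse-train excitation that stands in for a CELP codebook.

```python
import numpy as np


def scale_formants(lpc, formant_scale):
    """Hypothetical helper: shift formants by scaling the angles of the LPC poles.

    lpc holds [1, a1, ..., ap] for A(z) = 1 + a1 z^-1 + ... + ap z^-p.
    Pole radii (bandwidths) are kept, so a stable filter stays stable.
    """
    roots = np.roots(lpc)
    radii = np.abs(roots)
    angles = np.angle(roots)
    # Scale |angle| and clip to [0, pi]; sign() keeps conjugate pairs conjugate.
    new_angles = np.sign(angles) * np.clip(np.abs(angles) * formant_scale, 0.0, np.pi)
    new_roots = radii * np.exp(1j * new_angles)
    # Roots come in conjugate pairs, so any imaginary part is rounding noise.
    return np.real(np.poly(new_roots))


def perturb_frame(lpc, pitch_period, gain, f0_scale=1.0, formant_scale=1.0, gain_scale=1.0):
    """Hypothetical multiplicative stress perturbation of one analysis frame."""
    new_lpc = scale_formants(np.asarray(lpc, dtype=float), formant_scale)
    # Raising F0 by f0_scale shortens the pitch period by the same factor.
    new_period = max(1, int(round(pitch_period / f0_scale)))
    return new_lpc, new_period, gain * gain_scale


def synthesize(lpc, pitch_period, gain, n_samples):
    """Drive the all-pole filter gain / A(z) with an impulse train.

    The impulse train is a simplification standing in for CELP codebook excitation.
    """
    excitation = np.zeros(n_samples)
    excitation[::pitch_period] = 1.0
    order = len(lpc) - 1
    out = np.zeros(n_samples)
    for n in range(n_samples):
        # y[n] = g * x[n] - sum_{k=1..p} a_k * y[n - k]
        acc = gain * excitation[n]
        for k in range(1, order + 1):
            if n - k >= 0:
                acc -= lpc[k] * out[n - k]
        out[n] = acc
    return out
```

For example, `perturb_frame(lpc, 100, 2.0, f0_scale=1.25, gain_scale=1.5)` returns a pitch period of 80 samples and a gain of 3.0. Only the pole angles are scaled in this sketch because keeping the radii fixed preserves formant bandwidths and filter stability.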
Similar resources
Using FFI Interpolator and VQ Quantization for Designing of High Quality 1200 BPS Speech Vocoder
Storing or transmitting speech signals at very low bit rates is an active area of speech processing. We used stochastic inter-frame interpolators and vector quantization (VQ) as a new method for developing a high-quality 1200 BPS speech vocoder. Objective and subjective test results show that the performance of the new vocoder is comparable to that of 4800 BPS standard vocoders (such as CELP).
A novel training approach for improving speech recognition under adverse stressful conditions
This paper presents a new training approach for improving recognition of speech under emotional and environmental stress. The proposed approach consists of training a speech recognizer with synthetically generated speech under each stress condition using stress perturbation models previously formulated in [4, 1]. The perturbation models were previously formulated to statistically model the para...
Synthesis of stressed speech from isolated neutral speech using HMM-based models
In this study, a novel approach is proposed for modeling speech parameter variations between neutral and stressed conditions and employed in a technique for stressed speech synthesis. The proposed method consists of modeling the variations in pitch contour, voiced speech duration, and average spectral structure using Hidden Markov Models (HMMs). While HMMs have traditionally been used for recog...
A novel rate selection algorithm for transcoding CELP-type codec and SMV
In this paper, we propose an efficient rate selection algorithm that can be used to transcode speech encoded by any code-excited linear prediction (CELP)-type codec into a format compatible with the selectable mode vocoder (SMV) via direct parameter transformation. The proposed algorithm performs rate selection using the CELP parameters. Simulation results show that while maintaining similar overal...